Q1. Load the "titanic" dataset using the load_dataset function of seaborn. Use Plotly express to plot a scatter plot for age and fare columns in the titanic dataset.
Ans.
# Import the required libraries
import seaborn as sns
import plotly.express as px
# Load the titanic dataset
titanic_data = sns.load_dataset("titanic")
titanic_data.shape
(891, 15)
titanic_data.head()
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
# Create the scatter plot using Plotly Express
fig = px.scatter(titanic_data, x="age", y="fare")
# Show the scatter plot
fig.show()
Q2. Using the tips dataset in the Plotly library, plot a box plot using Plotly express.
Ans.
import plotly.express as px
# Load the tips dataset
tips_data = px.data.tips()
tips_data.shape
(244, 7)
tips_data.head()
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
# Create the box plot using Plotly Express
fig = px.box(tips_data, x="day", y="total_bill")
# Show the box plot
fig.show()
Q3. Using the tips dataset in the Plotly library, plot a histogram with x="sex" and y="total_bill". Also, use the "smoker" column with the pattern_shape parameter and the "day" column with the color parameter.
Ans.
import plotly.express as px
# Load the tips dataset
tips_data = px.data.tips()
tips_data.shape
(244, 7)
tips_data.head()
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
# Create the histogram using Plotly Express
fig = px.histogram(tips_data, x="sex", y="total_bill", color="day", pattern_shape="smoker")
# Show the histogram
fig.show()
Q4. Using the iris dataset in the Plotly library, plot a scatter matrix, using the "species" column for the color parameter. Note: Use only the "sepal_length", "sepal_width", "petal_length", and "petal_width" columns with the dimensions parameter.
Ans.
import plotly.express as px
# Load the iris dataset
iris_data = px.data.iris()
iris_data.shape
(150, 6)
iris_data.head()
| | sepal_length | sepal_width | petal_length | petal_width | species | species_id |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | 1 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa | 1 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa | 1 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa | 1 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa | 1 |
# Create the scatter matrix plot using Plotly Express
fig = px.scatter_matrix(iris_data, dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"], color="species")
# Show the scatter matrix plot
fig.show()
Q5. What is a distplot? Using Plotly Express, plot a distplot.
Ans.
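A distplot, or distribution plot, shows how the values of a continuous variable are distributed. It typically combines a histogram with a smooth density curve (a kernel density estimate, or KDE), and often a rug plot that marks the individual observations. The name comes from Seaborn's distplot function; Plotly Express has no function of that name. In Plotly, plotly.figure_factory.create_distplot builds this kind of figure, and with Plotly Express a similar plot can be assembled from px.histogram, either with the marginal argument or by overlaying a KDE curve. Both Plotly Express approaches are shown in the code that follows.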
import plotly.express as px
import numpy as np
# Create some data
data = np.random.randn(1000)
# Approximate a distplot: a histogram with a rug plot in the margin
fig = px.histogram(x=data, marginal="rug")
# Add a title
fig.update_layout(title_text="Distribution of Data")
# Show the plot
fig.show()
# Second version: a density-normalized histogram with a KDE curve overlaid
import plotly.express as px
import numpy as np
import scipy.stats as stats
# Create a sample dataset
np.random.seed(2)
data = np.random.randn(1000)
# Create the histogram, normalized so that it is on the same scale as the KDE
fig = px.histogram(x=data, nbins=30, histnorm='probability density')
# Add the KDE curve
x_values = np.linspace(data.min(), data.max(), 100)
y_values = stats.gaussian_kde(data).pdf(x_values)
fig.add_scatter(x=x_values, y=y_values, mode='lines', name='KDE')
# Update the layout
fig.update_layout(title='Distribution Plot')
# Show the plot
fig.show()